We study the problem of graph partitioning, or clustering, in sparse networks with prior information about the clusters. Specifically, we assume that for a fraction p of the nodes the true cluster assignments are known in advance. This can be understood as a semi-supervised version of clustering, in contrast to unsupervised clustering where the only available information is the graph structure. In the unsupervised case, it is known that there is a threshold of the inter-cluster connectivity beyond which clusters cannot be detected. Here we study the impact of the prior information on the detection threshold, and show that even minute [but generic] values of p > 0 shift the threshold downwards to its lowest possible value. For weighted graphs we show that a small amount of semi-supervising can be used for a non-trivial definition of communities.

Graph partitioning is an important problem with a wide range of applications in circuit design, social sciences, data mining, etc. In the context of social network analysis, this problem is better known as community detection, where a community is loosely defined as a group of nodes such that the link density within the group is higher than across different groups. Many real-world networks have well-manifested community structure [1], and much recent research has focused on developing community detection methods (see [2] for a review).

A commonly observed feature of many of those methods is the presence of a threshold in inter-community coupling beyond which communities cannot be detected [3]. This problem has been studied by formulating community detection as the minimization of a certain Potts-Ising Hamiltonian [4, 5]. It has been shown that the graph partitioning problem is indeed characterized by a phase transition from detectable to undetectable regimes as one increases the coupling strength between the clusters.

Most work so far has considered the unsupervised version of clustering, where the only available information is the graph structure. In many situations, however, one might have additional information about possible cluster assignments of certain nodes. This version of the problem has attracted recent interest in the machine learning community, in the context of graph-based semi-supervised learning and classification (see [6] for a recent survey). In this setting, one assumes that, besides the graph structure, one also has some prior information about the cluster membership of a set of nodes [7, 8]. Our purpose is to present a theoretical analysis of the semi-supervised version of the community detection problem for simple, bi-community networks, and to uncover new scenarios of community detection facilitated by semi-supervising.

Model. Consider an Erdős-Rényi graph: each pair of nodes is linked with probability $\alpha/N$, where $N$ is the number of nodes in the graph. Each link is given a weight $J > 0$. Now imagine a pair of such identical Erdős-Rényi graphs, which models two clusters (communities). Besides the intra-cluster $J$-links, each node in one graph is linked with probability $\gamma/N$ to each node in the other graph. Each of these inter-cluster links is given a weight $K > 0$. For clarity, both $J$ and $K$ are assumed to be integer numbers.
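The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `planted_bisection`, the node numbering (first cluster `0..n-1`, second cluster `n..2n-1`) and the edge-list output format are choices made here and are not part of the model definition.

```python
import random


def planted_bisection(n, alpha, gamma, J=1, K=1, seed=0):
    """Sample a weighted planted-bisection graph as a list of (u, v, weight).

    Assumed layout: nodes 0..n-1 form one cluster, nodes n..2n-1 the other.
    Intra-cluster pairs are linked with probability alpha/n (weight J),
    inter-cluster pairs with probability gamma/n (weight K).
    """
    rng = random.Random(seed)
    edges = []
    # Two identical Erdos-Renyi graphs, one per cluster.
    for offset in (0, n):
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < alpha / n:
                    edges.append((offset + i, offset + j, J))
    # Inter-cluster links between every node pair across the two clusters.
    for i in range(n):
        for j in range(n, 2 * n):
            if rng.random() < gamma / n:
                edges.append((i, j, K))
    return edges


if __name__ == "__main__":
    sample = planted_bisection(n=300, alpha=3.0, gamma=1.0, J=1, K=2, seed=1)
    print(len(sample), "links sampled")
```

The double loop is quadratic in `n`, which is adequate for small illustrations; a sparse sampler would be preferable for large graphs.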

This planted bisection graph model will be employed for studying the performance of the cluster detection method, which places an Ising spin on each node and lets these spins interact via the network links [4, 5]:
where we have made the bi-cluster nature of the network explicit by introducing separate spin variables $s_i = \pm 1$ and $\bar s_i = \pm 1$ ($i = 1, \ldots, N$) for the two clusters. Here $J_{ij}$ and $\bar J_{ij}$ are identically and independently distributed random variables which assume zero with probability $1 - \alpha/N$ and $J > 0$ with probability $\alpha/N$. Likewise, the $K_{ij}$ identically and independently are equal to zero with probability $1 - \gamma/N$ and to $K > 0$ with probability $\gamma/N$. To enforce equipartition, the Hamiltonian (1) will be studied under the equipartition constraint (2).
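For orientation, the Hamiltonian (1) and the constraint (2) can be sketched as below. This is a reconstruction from the surrounding description, not a quotation: it assumes that all links act ferromagnetically (which matches the later statement that intercluster links exert negative fields on a test spin) and that equipartition means zero total magnetization. The index conventions are likewise assumptions.

```latex
% Assumed form of (1): ferromagnetic couplings on intra- and inter-cluster links
H = -\sum_{i<j} J_{ij}\, s_i s_j
    -\sum_{i<j} \bar J_{ij}\, \bar s_i \bar s_j
    -\sum_{i,j} K_{ij}\, s_i \bar s_j ,
% Assumed form of (2): equal numbers of up and down spins (equipartition)
\sum_{i=1}^{N} \left( s_i + \bar s_i \right) = 0 .
```

Under these assumptions, minimizing $H$ at fixed zero total magnetization amounts to a minimum-weight bisection of the graph.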
Thus, detecting the sign of a given spin $s_i$ at zero temperature (so as to exclude all thermal fluctuations) we can conclude to which cluster the corresponding node belongs: all spins having equal signs belong to the same cluster. The error probability $p_e$ for the cluster assignment is

$$ p_e = \frac{1 - m}{2}, \qquad m = [\langle s_i \rangle_{T=0}]_{\rm av} = -[\langle \bar s_j \rangle_{T=0}]_{\rm av}, \tag{3} $$

where $m$ is the (single-cluster) magnetization, $\langle \ldots \rangle_{T=0}$ is the zero-temperature Gibbsian average, i.e. the average over all configurations of spins having in the thermodynamic limit the minimal energy given by (1), and where $[\ldots]_{\rm av}$ is the average over the bi-graph structure, i.e., over $\{J_{ij}\}$, $\{\bar J_{ij}\}$ and $\{K_{ij}\}$. As implied by the self-averaging feature, instead of taking $[\ldots]_{\rm av}$, we can evaluate $\langle s_i \rangle_{T=0}$ on the most probable bi-graph structure(s).
The above formulation refers to unsupervised community detection. Semi-supervising implies that for some (randomly distributed) nodes the cluster assignment is known in advance [7, 8]. Thus, (1) is modified as
where $f_i$ (resp. $\bar f_i$) are identically and independently distributed random variables that are equal to 0 with probability $1 - p$ and to $\infty$ (resp. $-\infty$) with probability $p$. The constraint (2) is satisfied in the average sense. Thus, with respect to two randomly chosen sets of spins (each containing $pN$ members) we know exactly to which cluster they belong, since $f_i = \infty$ implies $s_i = 1$. Below we study the threshold of cluster detection with and without semi-supervising.

Let $P(h)$ denote the probability of an internal field $h$ acting on one $s$-spin. According to the zero temperature cavity method, $P(h)$ satisfies the following equation

$$ P(h) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} \frac{\alpha^n e^{-\alpha}}{n!}\, \frac{\gamma^m e^{-\gamma}}{m!} \int df\, \psi(f) \int \prod_{k=1}^{n} dh_k\, P(h_k) \prod_{k=1}^{m} dg_k\, \bar P(g_k)\; \delta\Big( h - f - \sum_{k=1}^{n} \phi[h_k, J] - \sum_{k=1}^{m} \phi[g_k, K] \Big), \tag{5} $$

where $\phi[a, b] = {\rm sign}(a) \min[\,|a|, b\,]$, and where $g_k$ (resp. $h_k$) are the fields acting on the $s$-spin from $\bar s$-spins (resp. from other $s$-spins). These fields naturally enter with weight $\gamma^m e^{-\gamma}/m!$ (resp. $\alpha^n e^{-\alpha}/n!$), which is the degree distribution of the corresponding Erdős-Rényi network.

In (5), $\psi(f) = p\,\delta(f - \infty) + (1 - p)\,\delta(f)$ is the distribution of the frozen (supervising) field acting on $s$-spins. Due to (4) and the complete inversion symmetry between the two clusters, we can take $\bar P(g) = P(-g)$, and then (5) is worked out via the Fourier representation of the delta-function, yielding $P(h) = p\,\delta(h - \infty) + (1 - p)\tilde P(h)$, where $\tilde P(h)$ refers to those $s$-spins which were not directly frozen by infinitely strong random fields:

$$ \tilde P(h) = e^{-\alpha - \ldots} \tag{6} $$
The physical order-parameters are expressed as [9]

$$ m = [\langle s_i \rangle_{T=0}]_{\rm av} = \int dh\, P(h)\, {\rm sign}(h), \tag{7} $$
$$ q = [\langle s_i \rangle^2_{T=0}]_{\rm av} = \int dh\, P(h)\, {\rm sign}^2(h), \qquad {\rm sign}(0) = 0, \tag{8} $$

where $[\ldots]_{\rm av}$ is now the average over the bi-graph structure and the random fields. Recall that $m$ defines the error probability according to (3). In (8) $q$ differs from 1 due to a possible contribution $\propto \delta(h)$ in $P(h)$. Thus, $1 - q$ is the fraction of spins that do not have definite magnetization, since they do not belong to the sub-graph of strongly connected spins [which exists above the percolation threshold], while $q - m$ is the fraction of spins that do not have definite magnetization, because they are strongly frustrated, though they do belong to the sub-graph of strongly connected spins.
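A numerical way to explore the cavity equation (5) and the order parameters (7)-(8) is population dynamics. The sketch below is an illustration under explicit assumptions, not the analytic route followed in the text: it treats only $J = K = 1$, uses $\bar P(g) = P(-g)$ so that a single population of fields suffices, represents a frozen field by `math.inf`, and starts from a fully magnetized population. The function names and default parameters are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Poisson sample via Knuth's product method (fine for small means)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sign(h):
    """sign(0) = 0, as in (8); also works for +/- infinity."""
    return (h > 0) - (h < 0)


def population_dynamics(alpha, gamma, p=0.0, size=2000, sweeps=30, seed=0):
    """Iterate the zero-temperature cavity update for J = K = 1.

    Returns (m, q) estimated from the final population of fields.
    """
    rng = random.Random(seed)
    pop = [1] * size  # assumed initial condition: all fields positive
    for _ in range(sweeps):
        new = []
        for _ in range(size):
            if rng.random() < p:
                new.append(math.inf)  # directly frozen (supervised) spin
                continue
            h = 0
            # Intra-cluster neighbours push along their own field sign.
            for _ in range(poisson(rng, alpha)):
                h += sign(rng.choice(pop))
            # Inter-cluster neighbours have mirrored fields, g = -h'.
            for _ in range(poisson(rng, gamma)):
                h -= sign(rng.choice(pop))
            new.append(h)
        pop = new
    m = sum(sign(h) for h in pop) / size
    q = sum(sign(h) ** 2 for h in pop) / size
    return m, q


if __name__ == "__main__":
    print(population_dynamics(6.0, 1.0))  # strongly clustered regime
    print(population_dynamics(2.0, 1.0))  # weakly clustered regime
```

Because the population is finite, the estimate of $m$ fluctuates around its limiting value; larger `size` and more `sweeps` reduce this noise at the cost of run time.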

Unweighted ($J = K$) unsupervised ($p = 0$) situation: Since at $T = 0$ only the ratio $J/K$ matters (and not the absolute values of $J$ and $K$) we assume $J = K = 1$. Now the local fields can attain only integer values and the solution of (6) is searched for as

$$ P(h) = \sum_{n} c_n\, \delta(h - n), \tag{9} $$



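which upon substituting into (6) and using

$$ e^{\frac{x}{2}\left( e^{y+z} + e^{-y-z} \right)} = \sum_{n=-\infty}^{\infty} I_n(x)\, e^{-ny - nz}, \tag{10} $$

where $I_n(x)$ is the modified Bessel function, produces

$$ c_n = e^{-(\alpha+\gamma) q - n y}\, I_n(x), \tag{11} $$
$$ x = \sqrt{(\alpha+\gamma)^2 q^2 - (\alpha-\gamma)^2 m^2}, \qquad y = {\rm atanh}\, \frac{(\gamma - \alpha)\, m}{(\alpha + \gamma)\, q}. $$

This then implies via (7, 8, 9)

$$ 1 - q = e^{-(\alpha+\gamma) q}\, I_0[x], \tag{12} $$
$$ m = -2 e^{-(\alpha+\gamma) q} \sum_{n=1}^{\infty} I_n(x) \sinh(n y). \tag{13} $$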

Eq. (13) predicts a second-order transition, where $m$ is the order-parameter. In the vicinity of the second-order phase transition one can expand (12, 13) over $m$:

$$ 1 - q = e^{-(\alpha+\gamma) q}\, I_0[(\alpha+\gamma) q], \tag{14} $$
$$ 1 = (\alpha - \gamma)(1 - q) \left[ 1 + \frac{I_1[(\alpha+\gamma) q]}{I_0[(\alpha+\gamma) q]} \right], \tag{15} $$

where we employed identities involving Bessel functions. We have $m > 0$ ($m = 0$) if the RHS of (15) is larger (smaller) than its LHS.
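The step from (12, 13) to (14, 15) can be spelled out as follows. This is a sketch of one possible derivation; the text only states that Bessel-function identities were employed, so the specific recurrence used here is an assumption about which identity is meant.

```latex
% Linearize (13) for small m: x -> x_0 = (\alpha+\gamma) q and
% y \simeq -(\alpha-\gamma) m / [(\alpha+\gamma) q], so \sinh(ny) \simeq n y:
m \simeq 2 e^{-x_0}\, \frac{(\alpha-\gamma)\, m}{x_0} \sum_{n=1}^{\infty} n\, I_n(x_0).
% Recurrence I_{n-1}(x) - I_{n+1}(x) = (2n/x) I_n(x); the sum telescopes:
\sum_{n=1}^{\infty} n\, I_n(x_0)
  = \frac{x_0}{2} \sum_{n=1}^{\infty} \left[ I_{n-1}(x_0) - I_{n+1}(x_0) \right]
  = \frac{x_0}{2} \left[ I_0(x_0) + I_1(x_0) \right].
% Dividing by m and using 1 - q = e^{-x_0} I_0(x_0) from (14):
1 = (\alpha-\gamma)\, e^{-x_0} \left[ I_0(x_0) + I_1(x_0) \right]
  = (\alpha-\gamma)(1-q) \left[ 1 + \frac{I_1(x_0)}{I_0(x_0)} \right].
```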

Eq. (15) determines the detection threshold, above which the method is capable of detecting clustering with better than random probability of error (3). In the $\alpha$-$\gamma$ plane, the threshold line starts from ($\alpha = 1$, $\gamma = 0$), see Fig. 1, since (14) predicts a percolation bound for $q$: $q = 0$ ($q > 0$) for $\alpha + \gamma < 1$ ($\alpha + \gamma > 1$). Naturally, close to the percolation bound $\alpha = 1$, even very small intercluster coupling $\gamma$ nullifies $m$. Fig. 1 shows that at the detection threshold $\alpha > \gamma$; moreover, the difference $\alpha - \gamma$ at the threshold grows as $\sqrt{2\pi(\alpha + \gamma)}$ for a large $\alpha + \gamma$; see (14). Thus, the ratio $(\alpha - \gamma)/(\alpha + \gamma)$ converges to zero for a large $\alpha + \gamma$. In this weak sense, the detection threshold converges to $\alpha = \gamma$ for large $\alpha + \gamma$, while for any finite $\alpha$ the unsupervised clustering detection threshold lies below the line $\alpha = \gamma$; see Fig. 1.

TABLE I: Weighted situation: $2J = K = 2$. For $\gamma = 1$ and various semi-supervising degrees $p$ we list the clustering threshold $\alpha$ and the values of $q$ and $m_1$ at this threshold.

FIG. 1: The phase diagram for $J = K = 1$. The line on the ($\alpha$, $\gamma$) plane indicates second-order phase-transition from $m = 0$ (no clustering detection) to $m > 0$ (clustering detection).

FIG. 2: Normal curve: magnetization $m$ versus $\alpha$ for $\gamma = 1$. $m$ undergoes second-order phase-transition at $\alpha = 3.4$. Dashed curves: remnant [semi-supervised] magnetization $\tilde m$ versus $\alpha$ for $\gamma = 1$. From top to bottom: $p = 0.2$, $0.05$, $0.01$.
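The threshold condition (14)-(15) is easy to evaluate numerically. The following sketch uses only the Python standard library; the helper names (`bessel_i`, `solve_q`, `rhs_15`, `threshold_alpha`), the power-series evaluation of the Bessel functions and the bisection bracket are choices made here for illustration, so treat it as a rough check rather than the procedure used to produce the figures.

```python
import math


def bessel_i(n, x):
    """Modified Bessel function I_n(x) from its power series (moderate x)."""
    half = x / 2.0
    term = half ** n / math.factorial(n)  # k = 0 term of the series
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= half * half / (k * (k + n))
        total += term
        k += 1
    return total


def solve_q(c):
    """Positive root of (14): 1 - q = exp(-c q) I_0(c q), with c = alpha + gamma."""
    if c <= 1.0:
        return 0.0  # below the percolation bound only q = 0 exists
    lo, hi = 1e-9, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        x = c * mid
        if 1.0 - mid - math.exp(-x) * bessel_i(0, x) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rhs_15(alpha, gamma):
    """Right-hand side of (15); values above 1 correspond to m > 0."""
    c = alpha + gamma
    q = solve_q(c)
    x = c * q
    return (alpha - gamma) * (1.0 - q) * (1.0 + bessel_i(1, x) / bessel_i(0, x))


def threshold_alpha(gamma, span=20.0):
    """Bisect in alpha for rhs_15 = 1; the bracket [gamma, gamma + span] is assumed."""
    lo, hi = gamma, gamma + span
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if rhs_15(mid, gamma) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(threshold_alpha(1.0))
```

For $\gamma = 1$ this should land close to the value $\alpha = 3.4$ quoted in the caption of Fig. 2; for large $\gamma$ the assumed bracket width `span` may need to be increased.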

Under semi-supervising we still employ (9, 10) and obtain (11, 12, 13), but now in the RHS of these equations one should substitute $m \to p + (1 - p)\tilde m$ and $q \to p + (1 - p)\tilde q$. Expanding over a small $\tilde m$ we get

$$ 1 - \tilde q = e^{-(\alpha+\gamma)(p + [1-p]\tilde q)}\, I_0[(\alpha+\gamma)(p + [1-p]\tilde q)], \tag{16} $$
$$ \tilde m = p\,(\alpha - \gamma)(1 - \tilde q) \left[ 1 + \frac{I_1[(\alpha+\gamma)(p + [1-p]\tilde q)]}{I_0[(\alpha+\gamma)(p + [1-p]\tilde q)]} \right]. $$

Now $\tilde m > 0$ for any $\alpha - \gamma > 0$. This is the average-connectivity threshold, which for the considered unweighted scenario is the only possible definition of clustering. Thus, any generic semi-supervising leads to the theoretically best possible threshold $\alpha = \gamma$; see Fig. 2.

Note also that for $p > 0$, (16) has a non-trivial solution $\tilde q < 1$ for arbitrary $\alpha + \gamma > 0$.
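The semi-supervised expressions can be evaluated in the same spirit. The sketch below solves (16) for $\tilde q$ by bisection and then evaluates the small-$\tilde m$ expansion quoted above; the function names and the series-based Bessel helper are assumptions made for this illustration, and the result is only meaningful where that expansion applies.

```python
import math


def bessel_i(n, x):
    """Modified Bessel function I_n(x) from its power series (moderate x)."""
    half = x / 2.0
    term = half ** n / math.factorial(n)
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= half * half / (k * (k + n))
        total += term
        k += 1
    return total


def solve_q_tilde(alpha, gamma, p):
    """Root of (16) on [0, 1] for the non-frozen spins (requires p > 0)."""
    c = alpha + gamma
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        x = c * (p + (1.0 - p) * mid)
        if 1.0 - mid - math.exp(-x) * bessel_i(0, x) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def remnant_magnetization(alpha, gamma, p):
    """Leading-order remnant magnetization from the expansion below (16)."""
    q = solve_q_tilde(alpha, gamma, p)
    x = (alpha + gamma) * (p + (1.0 - p) * q)
    ratio = bessel_i(1, x) / bessel_i(0, x)
    return p * (alpha - gamma) * (1.0 - q) * (1.0 + ratio)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2):
        print(p, remnant_magnetization(2.0, 1.0, p))
```

In this expansion the sign of the result is set by $\alpha - \gamma$ alone, which is the numerical counterpart of the statement that any generic $p > 0$ moves the threshold to $\alpha = \gamma$.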

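The weighted situation $J \neq K$ will be studied via two particular (but important) cases $2J = K = 2$ and $2K = J = 2$, to make them amenable to analytic approach. Putting (9) into (5) and using (10) two times, we see that there are now four order-parameters:

$$ m, \qquad q, \qquad q_1 = c_1 + c_{-1}, \qquad m_1 = c_1 - c_{-1}. \tag{17} $$

Note that only $q$ and $m$ are observed from the single-spin statistics, see (7, 8); $q_1$ and $m_1$ can be observed only via measuring the internal field distribution $P(h)$.

For $2J = K = 2$ we introduce the following notations

$$ C_k(q, m) \equiv 2 e^{-(\alpha+\gamma)(p + [1-p] q)} \sum_{n=-\infty}^{\infty} I_n(u)\, I_{k-2n}(x) \cosh[2yn - vn - yk], $$
$$ S_k(q, m) \equiv 2 e^{-(\alpha+\gamma)(p + [1-p] q)} \sum_{n=-\infty}^{\infty} I_n(u)\, I_{k-2n}(x) \sinh[2yn - vn - yk], $$
$$ z_\pm \equiv q \pm m, \qquad z^{1}_\pm \equiv q_1 \pm m_1, \qquad \xi \equiv \frac{2p}{1-p}, $$
$$ x = (1 - p) \sqrt{ (\alpha z_- + \gamma z^{1}_+)(\alpha [z_+ + \xi] + \gamma z^{1}_-) }, \tag{18} $$
$$ y = \frac{1}{2} \ln \frac{\alpha z_- + \gamma z^{1}_+}{\alpha [z_+ + \xi] + \gamma z^{1}_-}, \qquad v = \frac{1}{2} \ln \frac{z_+ + \xi - z^{1}_+}{z_- - z^{1}_-}, \tag{19} $$
$$ u = \gamma (1 - p) \sqrt{ (z_- - z^{1}_-)(z_+ + \xi - z^{1}_+) }, \tag{20} $$

and write down the order-parameter equations:

$$ q = \sum_{k=1}^{\infty} C_k(q, m), \qquad q_1 = C_1(q, m), \tag{21} $$
$$ m = \sum_{k=1}^{\infty} S_k(q, m), \qquad m_1 = S_1(q, m). \tag{22} $$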
Eqs. (17)-(22) apply also for $2K = J = 2$, but now in (18)-(20) we should interchange $\alpha$ and $\gamma$, and then substitute $y \to -y$ and $v \to -v$. Eqs. (21, 22) predict a second-order transition over $m$ and $m_1$. Similarly to the unsupervised case, the threshold of this transition is found via expanding (21, 22) over $m$ and $m_1$. But the real qualitative differences between weighted and unweighted situations show up under semi-supervising, which we consider in more detail below.
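One way to read the notations (18)-(20) is through the Poisson rates with which a test $s$-spin receives field contributions $\pm 1$ and $\pm 2$ in the case $2J = K = 2$. The block below is a sketch of that bookkeeping, not a quotation from the derivation; it assumes that $z_\pm = q \pm m$ and $z^{1}_\pm = q_1 \pm m_1$ refer to the spins that are not directly frozen, that $\xi = 2p/(1-p)$, and that $\bar P(g) = P(-g)$. The rate symbols $r_{\pm 1}$, $r_{\pm 2}$ are introduced here for illustration only.

```latex
% Contributions +1 / -1: intracluster links (weight 1) from spins with h > 0 / h < 0,
% and intercluster links (weight 2) from neighbours whose own field is -1 / +1.
r_{+1} = \tfrac{1-p}{2} \left( \alpha [z_+ + \xi] + \gamma z^{1}_- \right), \qquad
r_{-1} = \tfrac{1-p}{2} \left( \alpha z_- + \gamma z^{1}_+ \right),
% Contributions +2 / -2: intercluster links from neighbours with |field| >= 2
% (frozen neighbours in the other cluster always contribute -2).
r_{+2} = \tfrac{\gamma (1-p)}{2} \left( z_- - z^{1}_- \right), \qquad
r_{-2} = \tfrac{\gamma (1-p)}{2} \left( z_+ + \xi - z^{1}_+ \right),
% The Bessel-function arguments and exponents then follow as
x = 2 \sqrt{r_{+1} r_{-1}}, \quad u = 2 \sqrt{r_{+2} r_{-2}}, \quad
e^{-2y} = \frac{r_{+1}}{r_{-1}}, \quad e^{-2v} = \frac{r_{+2}}{r_{-2}},
% and the total rate fixes the normalization of the field distribution:
r_{+1} + r_{-1} + r_{+2} + r_{-2} = (\alpha + \gamma)(p + [1-p] q).
```

Interchanging $\alpha$ and $\gamma$ in $r_{\pm 1}$ swaps the roles of the $+1$ and $-1$ rates, which is consistent with the substitution $y \to -y$ stated for $2K = J = 2$.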

First we focus on $2J = K = 2$ and recall that the clustering threshold is defined via $m = 0$. While for the previous unweighted situation any amount of semi-supervision (as quantified by $p$) sufficed for shifting the clustering threshold to a $p$-independent value, here the detection threshold starts to depend on $p$, and the smallest threshold is achieved for $p \to 0$; see Table I for an illustration. To understand this seemingly counter-intuitive observation, note that the detection threshold is achieved as a balance between the intercluster links [with the average connectivity $\gamma$ and weight $K = 2$] that, due to negatively frozen spins, exert negative fields on the test spin and the intracluster links [with the average connectivity $\alpha$ and weight $J = 1$] that exert positive fields. A larger $p$ facilitates the negative fields, since they have twice larger weight, which explains why vanishing semi-supervising $p \to 0$ facilitates a lower detection threshold.

p       0.005     0.1       0.4       0.8       0.999     0
α       0.6706    0.6481    0.6259    0.6170    0.6142    2.3922
q       0.5099    0.6197    0.7755    0.8616    0.8864    0.8163
m_1     -0.002    -0.0284   -0.0709   -0.1      -0.1094   0

TABLE II: The same as for Table I but for the weighted situation: $J = 2K = 2$.

Another interesting observation is that, contrary to the unsupervised situation where $m$ and $m_1$ simultaneously turn to zero at the threshold, we get $m_1 > 0$ at the semi-supervised threshold. Thus, some memory about the clustering is conserved despite the fact that $m = 0$; see Table I. Since $m_1$ cannot be observed via a single spin, this memory is hidden. The reason for $m_1 > 0$ is that $m_1$ counts the internal fields equal to $\pm 1$, and there are more such fields coming from the intracluster [connectivity $\alpha$, weight 1] links that exert positive fields due to the semi-supervised (frozen) spins.

Now consider perhaps the most paradoxical aspect of the semi-supervised detection threshold: it is smaller than the value deduced from balancing the cumulative weights of intracluster and intercluster links, which yields $\alpha J = \gamma K$. Indeed, according to Table I (where $\gamma = 1$) we have $\alpha = 1.5$ (reached for $p \to 0$) versus the weight-balancing value $\alpha = 2$. This result seemingly contradicts the intuition we got so far: i) A rough intuition about Hamiltonian (1) is that it is based on defining a cluster via the intracluster weight being larger than the intercluster weight. ii) The unsupervised threshold is well above the weight-balancing prediction; see Table I. iii) In the unweighted case ($J = K$) the semi-supervising just reduces the detection threshold towards $\alpha = \gamma$, which coincides with the weight-balancing value.

To understand this effect, we turn to the physical picture of the threshold, where positively and negatively acting links driven by the semi-supervised (frozen) spins compensate each other. At the weight-balance $\alpha J = \gamma K$ (with $J < K$) fewer (but stronger) intercluster links have the same weight as more numerous (but weaker) intracluster links. Since the intracluster links are more numerous, their overall effect on a (randomly chosen) test spin is more deterministic and hence capable of building up a positive $m$ at $\alpha J = \gamma K$. Thus, the actual threshold is reached for $\alpha J < \gamma K$.
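The asymmetry between many weak and few strong links can be illustrated with a deliberately crude toy calculation. The sketch below assumes that every neighbor of the test spin carries its true cluster sign, so that the field is $h = J A - K B$ with independent Poisson numbers $A$ (mean $\alpha$) and $B$ (mean $\gamma$) of intracluster and intercluster links. This idealization is not the self-consistent cavity solution, and the function names are hypothetical; the point is only to show the sign bias at exact weight balance.

```python
import math


def poisson_pmf(lam, k):
    """Probability that a Poisson variable with mean lam equals k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def sign_bias(alpha, gamma, J, K, kmax=60):
    """P(h > 0) - P(h < 0) for h = J*A - K*B, A ~ Poisson(alpha), B ~ Poisson(gamma).

    Toy assumption: all neighbours point along their true cluster orientation.
    The sums are truncated at kmax links, which is ample for small means.
    """
    bias = 0.0
    for a in range(kmax):
        pa = poisson_pmf(alpha, a)
        for b in range(kmax):
            h = J * a - K * b
            if h != 0:
                bias += pa * poisson_pmf(gamma, b) * (1 if h > 0 else -1)
    return bias


if __name__ == "__main__":
    # Weight balance alpha*J = gamma*K with many weak intracluster links.
    print(sign_bias(2.0, 1.0, J=1, K=2))
    # Weight balance with few strong intracluster links (J > K).
    print(sign_bias(0.5, 1.0, J=2, K=1))
```

In this toy setting the mean field vanishes at weight balance, yet the field is positive more often than negative when the intracluster links are the weaker but more numerous ones, and the bias reverses when the roles of the weights are exchanged.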

We thus conclude that for a weighted graph with $K > J$ a small [but generic] semi-supervising can be employed for defining the very clustering structure. This definition is non-trivial, since it performs better than the weight-balancing definition. Indeed, for a weighted network the definition of the detection threshold is not clear a priori, in contrast to unweighted networks, where the only possible definition goes via the connectivity balance $\alpha = \gamma$. To illustrate this ambiguity, consider a node connected to one cluster via few heavy links, and to another cluster via many light links. To which cluster should this node belong in principle? Our answer is that the proper cluster assignment in this case can be defined via semi-supervising.

It is interesting to calculate $m$ at the weight-balancing value $\alpha J = \gamma K$, since this is the semi-supervising benefit of those who would insist on the weight-balancing definition of the threshold; see Table I. Note finally that for large values of $\gamma$ both unsupervised and semi-supervised thresholds converge to $\alpha J = \gamma K$, since now fluctuations are irrelevant from the outset.

All these effects turn upside-down for $2K = J = 2$; see Table II. Now the threshold is minimized for the maximal semi-supervising $p \to 1$, $m_1$ is negative at the threshold (and thus the memory about the clustering is contained in $m - m_1 > 0$), and the semi-supervised detection threshold $\alpha$ is always larger than the weight-balancing value $\gamma K / J$. These results are explained by "inverting" the above arguments developed for $J < K$.

In conclusion, we analyzed community detection in semi-supervised settings, where one has prior information about the community assignments of certain nodes. We showed that for the planted bisection graph model with intracluster and intercluster average connectivities $\alpha$ and $\gamma$, respectively, even a tiny (but finite) semi-supervising shifts the detection threshold to its intuitive value $\alpha = \gamma$. We observed a similar effect of a lowered detection threshold for weighted graphs. In contrast to the unweighted case, the shift in this case depends on the degree of supervision. Furthermore, we found that when approaching the unsupervised limit by having $p \to 0^+$, the detection threshold converges to a value lower (better) than the one obtained via balancing intracluster and intercluster weights. We suggest that this can serve as an alternative definition of clusters. We also saw that in the semi-supervised case some (hidden) memory of the clustering survives at the detection threshold.